Implications of accounting for marker-based population structure in the quantitative genetic evaluation of genetic parameters related to growth and wood properties in Norway spruce

Background Forest geneticists typically use provenances to account for population differences in their improvement schemes; however, the historical records of the imported materials might not be very precise or well-aligned with the genetic clusters derived from advanced molecular techniques. The main objective of this study was to assess the impact of marker-based population structure on genetic parameter estimates related to growth and wood properties and their trade-offs in Norway spruce, by either incorporating it as a fixed effect (model-B) or excluding it entirely from the analysis (model-A). Results Our results indicate that models incorporating population structure significantly reduce estimates of additive genetic variance, resulting in substantial reduction of narrow-sense heritability. However, these models considerably improve prediction accuracies. This was particularly significant for growth and solid-wood properties, which showed to have the highest population genetic differentiation (QST) among the studied traits. Additionally, although the pattern of correlations remained similar across the models, their magnitude was slightly lower for models that included population structure as a fixed effect. This suggests that selection, consistently performed within populations, might be less affected by unfavourable genetic correlations compared to mass selection conducted without pedigree restrictions. Conclusion We conclude that the results of models properly accounting for population structure are more accurate and less biased compared to those neglecting this effect. This might have practical implications for breeders and forest managers where, decisions based on imprecise selections can pose a high risk to economic efficiency. Supplementary Information The online version contains supplementary material available at 10.1186/s12863-024-01241-x.


Background
It is well known that the geographic ranges of tree species have expanded and contracted several times during glacial and interglacial periods [1].The contraction phase led to the isolation between refugial area and substantial differentiation between populations [2].During the postglacial period, as climatic conditions successively improved, expansions formed the secondary contacts between the migrating fronts, resulting in introgression, and in Fennoscandia, establishment of steep latitudinal and longitudinal clines for adaptive traits [3,4].Indeed, the new biotic and abiotic factors encountered during the glaciation cycles, resulted in new adaptations [5].Individual phenotypes are thus the result of a dynamic interplay between evolutionary and demographic processes [6,7], which has resulted in the formation of specific tree species characteristics that may uniquely enable them to survive environmental changes.
Rapid changes in climatic conditions trigger new selective pressures on existing populations, which need to rapidly adapt to these changing conditions [8,9].Migration to more suitable habitats and phenotypic plasticity are regarded as short-term responses to these changes [10].Nevertheless, long-term responses require adaptive capacity, which depends on the level of genetic variation, upon which selection can act [11].Unravelling the factors underlying the adaptive traits is of particular interest for evolutionary biologists, forest tree breeders as well as conservation geneticists [12], and to successfully achieve this; one useful approach is to investigate how phenotypic variation is partitioned within and among populations [13].
Common garden testing of populations, so-called provenance tests, include broad genetic material representing a large proportion of a species' natural distribution.This provides a great opportunity to dissect genetic variation within a species and to determine the responses of populations to changing climate conditions through phenotypic plasticity and genetic selection to the changes [14].The presence of genetic variation within a population implies that the trait variation is heritable, while among population variation (e.g., Q ST ) can be the result of population demography.In general, these estimates provide important information on the ability of populations to respond to selection [15].For instance, narrowsense heritability is a key population parameter which is defined as the proportion of phenotypic variation that is due to additive genetic variance, upon which selection can act [11], and therefore is widely used in genetic improvement schemes [12].
Norway spruce (Picea abies (L.) H. Karst.) is a dominant conifer species of major economic importance in northern Europe.In Nordic countries, Norway spruce is an essential source of raw material for the forest-based industry, mainly used either for construction purposes or for pulp production [16].As such, the transfer and trade of Norway spruce seeds and seedlings from other regions and countries was extensive in Southern Sweden already during the 19 th century because foresters observed that such materials were more productive than local provenances.In the 1940s, a large-scale Norway spruce breeding program was initiated in Sweden, by selecting trees with superior phenotype (plus-trees) for first-generation seed orchards [17].And later, systematic genetic testing in the field confirmed that stands planted with materials of foreign origin indeed were more productive than the local provenances and appeared well adapted to the conditions in southern Sweden [18].Therefore, to increase the stand productivity and overcome seed deficits, large quantities of Norway spruce seeds from central and south-eastern Europe were imported to southern parts of Sweden both before and during the age of tree breeding [19].The geographic variation in the genetic make-up of introduced trees implies a strong population and admixture structure of base materials in the breeding program.Recently, genome-wide data analysis, utilizing the base populations of the Swedish breeding program, has revealed a more complex demographic history of the species, where three main genetic clusters had been described in Norway spruce: a northern domain in Fennoscandia and two southern domains in the Alps and Carpathians [20,21].Other studies of recurrent demographic migrations, followed by secondary interactions indicated a fragmentation of Norway spruce into seven genetic clusters [22][23][24]: Northern Fennoscandia (NFE), Central and South Sweden (CSE), Russia-Baltics (RusBal), Northern Poland (NPL), Central Europe (CEU), Alpine (ALP), and Carpathian (ROM) domains.In southern Sweden, a hybrid between CSE and ALP (CSE-ALP) has also recently been discovered [7].
A successful genetic improvement program depends on reliable estimates of additive genetic variance, narrow-sense heritability, and additive genetic correlations between traits [25].Since domestication of most of forest trees is still in its early stage, the understanding of the effect of geographical differences among the plus-trees greatly affects the results of genetic evaluations, and in turn, subsequent generations of breeding.Although forest geneticists have widely adopted provenances to account for population differences in their improvement schemes, the historical records of such imported materials might not be very precise or well-aligned with the genetic clusters derived from advanced molecular techniques.
Genetic parameters for growth and wood properties of Norway spruce have been extensively studied [16,26,27].However, the effect of genetic clusters (hereafter population structure) in genetic evaluations was not explicitly estimated.Therefore, the main objective of this study is to elucidate the effect of population structure, retrieved from genome-wide DNA markers, on genetic parameter estimates, fitted as a fixed effect or excluded using two mixed-linear model alternatives.More specifically, this paper addresses the following objectives: 1) to evaluate differences in phenotypic and genetic performances of Norway spruce populations for growth and wood properties 2) to quantify the level of population differentiation (Q ST ), narrow-sense heritability ( h 2 ), and additive genetic correlations (trade-offs) between growth and wood properties 3) To compare the predictive ability (PA) and prediction accuracy (ACC) of models using random cross validation.The analysis was done using a large dataset of 5,666 20-year-old trees from 524 half-sib families.Such investigation has practical implications for breeders and forest managers where, decisions based on imprecise selection of parameter estimates or wrong assumptions can pose a high risk to economic efficiency.

Comparison of phenotypic and genetic performances across populations
Two different mixed models (model-A and model-B) were tested to investigate the effect of population structure fitted as a fixed effect on genetic parameter estimates for different growth and wood properties.In models accounting for population structure (model-B), the effect of population structure was highly significant for all studied traits (p < 0.001), except for CELL and HEM (Table S1).
The performances of phenotypes and EBVs (using model-B only) across populations assessed for growth and solidwood properties are shown in Fig. 1.Such assessments for other studied traits are further included in supplementary materials (Fig. S1).Overall, populations showed greater differentiation based on the EBVs compared to the phenotypic values, suggesting that the variations among populations are primarily genetic rather than environmental.Nevertheless, in both performance categories, southern populations, such as ALP, CEU and NPL, were the most productive ones (higher RWT, HI7, DBH12, DBH21, TRadW, TTangW, MFA, and NUMRES).In contrast, their solid wood properties (DENS, MOE, TWTH) were lower than those of northern populations, such as CSE.In general, trees of CEU origin had the highest growth-related properties, while they had the lowest density-related properties.Such trends were particularly noticeable for EBVs.Trees of RusBal origin performed similarly to those of CSE origin, especially in terms of growth, whereas they had slightly lower DENS, FWT, FC and higher MOE (Fig. 1 and Fig. S1).No significant differences were observed among populations for chemical properties, except for LIG which was slightly higher for southern populations than CSE and RusBal (Fig. S1).
Generally, Q ST values were higher for growth traits with the greatest estimates obtained for DBH21 (~ 0.13 in Höreda and 0.20 in Erikstorp), which represent the highest genetic differentiation among populations, while Q ST values for chemical wood properties were nearly zero (Fig. 2).Q ST values for solidwood properties (0.03 to 0.05 in both trials) were generally higher than those observed for tracheid and resin properties with the greatest value obtained for TTWT and TTangW at ~ 0.03 (Fig. 2).

Quantitative genetic parameter estimates Narrow-sense heritability estimates
The most considerable difference between model-A and model-B, in terms of genetic parameter estimates, was the notable reduction of the additive genetic variance and the consequent reduction of narrow-sense heritabilities ( h 2 ) observed in model-B, particularly for growth (Tables 1 and 2).For example, in Höreda, the additive genetic variance estimates for DBH21 and RWT were 47% and 63% of the estimates derived from model-A, respectively.In Erikstorp, such proportions were 36% and 48%, respectively.Overall, h 2 was reduced by about 50% in Höreda and 65% in Erikstorp for DBH21.Similarly, h 2 for solidwood and some of tracheid properties were lower based on model-B.For instance, estimates for DENS, MOE, MFA, and TWTH decreased by 19.8%, 23.4%, 15.5%, and 17.5%, respectively in Höreda, while in Erikstorp, they decreased by 18.2%, 21.4%, 11.4%, and 15%, respectively.Nevertheless, reductions in h 2 based on model-B were negligible for resin properties and h 2 estimates for chemical properties were very similar in both models.

Additive genetic correlations (trade-offs) and genetic correlations between sites (G × E)
A subset of additive genetic correlations ( r a ) among dif- ferent traits, based on both models, are presented in Table 4, whereas the complete set is given in supplementary materials (Table S2).Additionally, the magnitude of r a was assessed separately for the two main popula- tions underlying this study (CSE and ALP) and a subset of their results are shown in Table 5.As expected, MOE and DENS were negatively associated with growth traits across the models, ranging from -0.66 (between MOE and DBH21) to -0.74 (between RWT and DENS) based on model-A.However, the magnitude of correlation estimates was lower based on model-B, ranging from -0.52 (between MOE and DBH21) and -0.62 (between RWT and DENS), corresponding to about 21% and 16% reduction of growth and wood properties trade-offs, compared to model-A (Table 4).Additionally, NUMRES and TRadW, which are positively associated with growth, showed negative genetic correlation with DENS, ranging from -0.50 to -0.65, respectively based on model-A, and from -0.42 to -0.58, respectively based on model-B.Additionally, r a between LIG and DENS was negative in both models, with a slight reduction (~ 3%) of the correlation based on model-B (Table 4).When the r a assessed for CSE and ALP separately, the most striking finding was the very high unfavourable correlation obtained between DBH21 and MOE for ALP (-0.83 ± 0.18) compared to CSE (-0.44 ± 0.12).Contrarily, r a between density and growth- related properties (i.e., between DENS and RWT) were less unfavourable in ALP (-0.54 ± 0.12), compared to CSE (-0.66 ± 0.06) (Table 5).
As expected, Type-B genetic correlations across sites ( r b ), representing the level of genotype by environment interaction (G × E) was low for wood properties, with r b ranging from 0.79 to 0.92 for solidwood and tracheid properties and from 0.91 to 0.99 for resin properties.Furthermore, r b across sites were very high for growth prop- erties in this study ( r b ~ 0.99 for HI, DBH12, and DBH21) based on both models, except for the lower correlation  S2).Overall, model-B resulted in a slight, but negligible, reduction of r b for the studied traits, compared to model-A.

Discussion
It has been demonstrated how population and family structure can adversely affect accuracies of genomic predictions and response to selections in practical plant breeding schemes [28].Likewise, incorporation of contemporary groups implemented directly in the pedigree or in the model as a fixed or random effect, was shown to improve model fitting and accuracy of breeding values in Douglas-fir [29].To our knowledge, this is the first quantitative genetic study to extensively examine the effect of marker-based population structure in quantitative genetic evaluations for unbiased estimation of genetic parameters, utilizing a large data set retrieved from the Swedish Norway spruce breeding program.
In the current study, significant differences in phenotypic and genetic performances of populations were observed, except for chemical properties.Additionally, across-population variation in estimated breeding values (EBVs) of individuals was notably higher than those observed for phenotypes, indicating differences among populations is due to genetic factors.Trees originating from southern latitudes, for instance those from CEU and ALP, were taller and larger than trees originating from northern latitudes, when grown in southern Sweden.Such result is in line with previous reports and the practical breeding guidelines for Norway spruce, which indicates that trees from southern origins outperform local provenances [18,30].Previously, it has been suggested that trees transferred to a higher latitude take advantage of longer photoperiods during growing season, and thereby have a higher growth rate than the local trees [31].Contrarily, trees originating from higher latitudes, such as CSE and RusBal had higher wood density, supporting the unfavourable association between low wood density and fast growth [32]; cheaper stem construction (low wood density) will increase the rate of leaf deployment per total mass increment and therefore growth rate in height and diameter [33].The amongpopulation genetic differentiation (Q ST ) for all studied traits was found to be highest for growth traits, especially at later ages (RWT, DBH12, DBH21), and secondly  for solid wood properties (DENS, MOE, PILOD, and MFA).This implies a high level of local adaptive potential in traits related to productivity [34].Nonetheless, Q ST values were much lower for tracheid and resin properties, and mostly negligible for chemical properties.This is in accordance with one recent study that investigated population differentiation and adaptation of Norway spruce for different traits by comparing Q ST and F ST [7].The authors showed that the macroscopic trait of stem diameter, in contrast to its microscopic components such as tracheid dimensions, is subject to divergent selection that dates back before the Last Glacial Maximum.Similarly, another study reported Q ST values being the highest for DBH in coastal Douglas-fir [29].The primary goal in every tree improvement program is to maximize genetic gain in economically important traits, which can be estimated as a product of heritability and selection differential [12].In our study, narrow sense heritability estimates ( h 2 ) obtained for growth and wood density were significantly larger based on the model ignoring the effect of population structure (model-A) than those obtained based on the model incorporating population effect (model-B).When partitioning the estimated genetic variance into its components, we have observed that such upward bias in model-A is mostly absorbed by the additive genetic variance.We reason that model-B allows dissection of the total genetic variation into components that reside within and among populations, whereas in model-A the additive genetic variation within and among-populations are confounded.This implies that failing to account for the among-population variation as a separate effect, particularly for traits having high Q ST , such as growth-related properties in the current investigation, can result in overestimation of the additive genetic variance.This ultimately leads to the biased prediction of the response to selection if the selection is carried out within populations or even within families.Nevertheless, when the focus is placed on selection of the best model, most of these decisions are made towards enhancement of predictive accuracy, which is a measure of the reliability of EBVs and is used to predict the response to selection [35].This accuracy can in practice be estimated by means of cross validations [36].Among others, random cross-validation is a common approach to assess efficiencies or accuracies of the estimated predictions in breeding schemes.Throughout the work we used 5-fold cross-validations to compare the models, measured as the correlation between predictions and observations.Model-B resulted in considerably higher predictive ability and prediction accuracy, particularly for growth and solidwood properties.It is noteworthy to mention that cross-validation results are consistent  for both sites, thus supporting the notion that the appropriate modelling of population structure in genetic evaluations has practical implications on the outcome of the selection process, and in turn, breeding program of Norway spruce.
Genetic correlations measure the level of relationship between two traits owing to genetic causes.Negative genetic correlations among desirable traits are often used as evidence for trade-offs and their investigation is important in understanding the evolutionary response of a trait as well as in designing effective breeding programs [37].One of the major causes of trade-offs is antagonistic pleiotropy, i.e., alleles that give rise to a high value for one trait and a low for the other.Another possible cause of trade-offs, although transient, is gametic phase linkage disequilibrium, which may occur when individuals from two populations with different gene frequencies intermate, as a side effect of recent directional selection or by biased or limited sampling [38].Nevertheless, genetic correlation is a complex population-specific genetic parameter, subject to be influenced by both allele frequency and environmental changes [37].
In line with previous reports in several coniferous species [39], additive genetic correlation estimates obtained for DENS and MOE with growth traits were negative across the models, indicating breeding for increased volume is achievable at the cost of decreased stem quality.Similarly, additive genetic correlation of lignin with density-related traits was unfavourable across the tested models, suggesting simultaneous improvement of trees for enhanced wood-quality and bioenergy production remains challenging in Norway spruce.Although the pattern of correlations was similar across the models, their magnitude was slightly lower in model-B, suggesting that selections consistently performed within populations may suffer less from the growth-wood quality trade-offs, compared to mass selections conducted without pedigree restrictions.More specifically, additive genetic correlations obtained in model-A are inflated as their variance is confounded with the variation residing among populations, and therefore, disentanglement of these effects in model-B resulted in the reduction of correlation estimates.
Our results similarly demonstrated that the magnitude of correlations can vary among populations.For instance, the unfavourable genetic correlation of growth with wood stiffness obtained for northern population (CSE) was only 50% of the one found for southern population (ALP), while the negative association of growth with wood density was weaker in ALP, compared to CSE.
Results of type-B genetic correlations revealed no evidence of genotype by environment interaction (G × E) for growth and wood properties underlying this study.However, type-B correlations were moderate to low for chemical properties, an indication of low stability in the performances of genotypes across the two sites for these traits.In general, there are various spectroscopic techniques, such as near-infrared spectroscopy (NIR), that can be successfully applied for rapid assessment of "hard-to-measure" chemical composition of wood [40].Although such techniques are highly advantageous, one of their drawbacks is that the predicted relationship between wavelengths and the property of interest are based only on a subset of samples, derived from wet chemistry analyses [41] like the technique used in the current study.Correspondingly, the poor performance of chemical properties, in terms of non-significant differences observed across populations, very low Q ST estimates, and high levels of G × E obtained in our study could be due to some extra noise affected prediction of the chemical properties.Future investigations should examine other spectroscopical methods along with different prediction and calibration models to determine whether the non-significant differences observed among populations for chemical properties has a biological or technical underlying factor.

Conclusions
In the current study, we examined the impact of markerbased population structure on genetic parameter estimates, utilizing a large data set retrieved from the breeding program of Norway spruce in Sweden, through two alternative models.Our findings show that there is a substantial genetic variation among populations, in terms of growth and wood properties.The model which accounts for population structure as a separate effect (model-B) results in substantial reduction of additive genetic variance, and subsequently, reduction of narrow-sense heritability estimates.However, prediction accuracies obtained based on this model were considerably higher than the alternative model (model-A).This was particularly significant for growth and solid-wood properties, which showed to have the highest population genetic differentiation (Q ST ) among the studied traits.Although the adverse genetic correlation between growth and wood properties remains as a constraint, the magnitude of correlations was slightly lower in model-B.This suggests that selections consistently performed within populations may suffer less from the growthwood quality trade-offs, compared to mass selections conducted without restrictions in terms of populationpedigree.Along with the lower accuracies obtained based on model-A, we may conclude that results of models neglecting effect of population structure are inaccurate and biased as the variation residing among populations is confounded with the additive genetic variance.This might have practical implications for future Norway spruce breeding program and potentially for other species when the breeding materials have heterogenous background.

Experimental materials
This study utilized data from two large open-pollinated progeny trials of Norway spruce: S21F9021146 (Höreda) (57.61°N 15.04°E) and S21F9021147 (Erikstorp) (55.90°N 13.93°E), located in southern Sweden.Both trials were established in the spring of 1990 by the Forestry Research Institute of Sweden (Skogforsk) and are a part of the breeding program of Norway spruce in southern Sweden.The genetic material in each trial originates from open-pollinated seeds collected from standing plustrees (trees with outstanding phenotype) within 112 stands.Each experiment has a randomized incomplete block with single-tree plot design, divided into 20 and 23 blocks, comprised of 1,373 and 1,375 half-sib families, in S21F9021146 and S21F9021147, respectively.More detailed information about trial characteristics is found in [26].

Phenotype measurements
Eighteen traits related to growth, wood quality (solid and tracheid properties), chemical composition and resin canal properties of wood were assessed in this study.Tree diameter at breast height (1.3 m above ground) was measured at ages 12 and 21 years (DBH12 and DBH21 [mm], respectively) and tree height was measured at age seven (HI7 [cm]).Trees were also assessed for pilodyn penetration at age 22 years (PILOD [mm]), which offers an indirect measure of wood density [42].A complete list of the measured traits, their abbreviations, and number of individuals and families representing them, is shown in Table 6.

SilviScan measurements
In 2010 and 2011, two 12-mm bark-to-pith increment cores were collected for analyses of radial variations of different traits at breast height from 5,666 trees, aged 20-21 years, representing 524 half-sib families.The cores were drilled from the northern side of the stems, in parallel and close to each other to allow joint evaluations of property data origination from the two cores.The same core was used for radial analyses of growth, solid, tracheid and chemical properties, all performed at Innventia, now RISE (Stockholm, Sweden).
High-resolution data were acquired with a SilviScan instrument [43]   latewood (LW), were identified from the wood density variation within each ring [44].
Because area-weighted values (AWV) more accurately represent the average properties of the wood the AWV for solid and tracheid properties was calculated and used in this study [45].

Chemical wood properties
Models for estimation of concentrations of wood chemical components (lignin (LIG), cellulose (CELL), and hemicelluloses (HEM) [%]) were developed from wood sampled from trees of similar origin.A set of 40 annual rings selected with the aim of covering as much chemical variability in the wood as possible in the chemical concentrations was identified and cut out from the discs with one longitudinal x radial surface of same orientation as the sides of the SilviScan sample strips.These sides, facing the side of the tracheid, were scanned with a hyperspectral imaging near infrared camera (NIR, 960 -2500 nm), providing spectra with 6 nm spectral resolution, and mean spectra for each sample (ring) were calculated.The samples were then analysed with chemical reference methods at MoRe Research (Örnsköldsvik, Sweden).The carbohydrates and lignin content for all samples were analysed by the SCAN-CM 71:09 and Tappi T222 methods [46], respectively.With the SCAN-CM 71:09 method, the different sugar monomers were obtained and from the monomer content, the percentage of cellulose and hemicellulose were quantified, following the formula developed for softwoods [47].The chemical ring mean data were associated with corresponding NIR spectra and partial least squares multivariate models for estimation of the chemical concentrations were created and validated with the data.The SilviScan samples were then polished on the corresponding sides and scanned with the same NIR-camera at 30 µm radial resolution, and the variation in chemical composition was predicted for all samples.The chemical composition at each point in the sample was spatially matched to the physical characteristics as determined by SilviScan, using an algorithm designed for the purpose, which allowed calculating ring-mean averages of the chemical composition.The coefficient of determination (R 2 ) and root mean square error of cross-validation (RMSEcv) for each model are given in Table S3.

Resin wood properties
Identification of the resin properties of wood was done using a learning-based model approach.Commonly, about 60 individual microscopy images are needed to cover a single SilviScan sample.A random sample of 104 high-resolution images obtained from different Sil-viScan analysis, were used to annotate, train, and test a neural network model to count the total number of resin canals (NUMRES) in each single microscopy image and the area of every resin canal, using the STARDIST method [48].To validate the performance of the model, it was deployed to predict the number of resin canals in 1634 individual microscopy images, corresponding to 25 SilviScan samples that were not used for training.The actual number of resin canals in these 25 samples was also manually counted and compared with the model's predictions, giving a coefficient of determination (R 2 ) of 0.8905.The model was then applied to the samples in this study.Subsequently, resin canal density (CANDENS [canal/cm 2 ]) of each sample was determined as NUMRES per unit area, while the average crosscut area of the resin canals (AVCAREA [μm 2 ]) was determined by dividing the total area of the resin canals by NUMRES.

Population structure based on DNA markers
Population structure of the individuals derived from SNP genotyping of the 518 mother trees.The analysis of population structure was applied on 399,801 noncoding SNPs with significantly linked sites removed (pairwise LD ≥ 0.2 and FDR value ≤ 0.05).EIGENSOFT v6.1.4was used to perform principal component analysis (PCA) on the reduced set of independent SNPs.Individuals of known origin were first grouped into seven major clusters.These individuals were then used as a training set in a 'Random Forest' regression model.The first five components of the PCA analysis were used for model fitting to classify the unknown individuals into each of the seven clusters.Fivefold cross-validation was performed for error estimation [6,22].The half-sib families are categorized into seven genetic clusters as follows: Central and South Sweden (CSE), Russia-Baltics (RusBal), Northern Poland (NPL), Central Europe (CEU), Alpine (ALP), Carpathians (ROM), and hybrids between CSE and ALP (CSE-ALP).The individuals belonging to the ROM cluster were excluded from the analysis, because this cluster was represented only by three half-sib families, whereas all other clusters comprised at least 10 families (Table S4).

Quantitative genetic parameters
Population structure can be fitted as either fixed or random effect in genetic evaluations using mixed-linear models [49].In the current study, population structure was considered as a fixed term, although alternative attempts to model it as random term or as an intrinsic part of the pedigree yielded similar results (data not presented).The effect of population structure on estimated genetic parameters as well as estimated breeding values (EBVs) was investigated through two alternative models 1) population structure excluded from the model entirely (model-A) 2) all populations included in the model (model-B).Later, the accuracy and predictive performance of these two models were compared using a k-fold cross-validation based on all studied traits.
Variance and covariance components for the studied traits were estimated using univariate and bivariate mixed linear models implemented in the ASReml-R statistical package [50].The fit of different models was evaluated using the Akaike Information Criteria (AIC) and loglikelihood estimates and the optimal model was selected based on a compromise of model fit and complexity.The following pedigree-based linear mixed (animal) model for jointsite analysis of model-A and model-B was fitted, with the only difference that in model-A the effect of population structure was dropped from the model.where Y is the vector of observations on tree m from genetic cluster (population) l in block k at the site j , µ is overall mean,P l ,S j , and B k(j) are the fixed effects of popu- lationl , site j , and block k within the site j , respectively.The variables G m(l) and SG jm(l) are the random additive genetic effects of individual m within populationl , and the random interactive effect of the site j and the individ- ual m within the populationl , respectively, and e jklm is the random residual effect.Preliminary analyses indicated there was no significant population-by-site ( SP jl ) effect for all traits, except for DBH12 and PILOD.Therefore, this effect was omitted from the model.
For the analysis and estimation of type-B genetic correlation between sites, heterogeneous additive genetic variance ∼ MVN (0, G ⊗ I) and heterogenous error vari- ance∼ MVN (0, R ⊗ I) were included in the model.
where σ 2 ai and σ ai are the additive genetic variances and covariances, respectively; σ 2 ei is the error variance for each site; I is the identity matrix equal to the number of observations at each site and 0 indicates no site-site error covariance.
Breeding values of individuals for model-B were calculated using genetic effect estimates obtained from equation one, as follows: where µ, G, and P are predicted mean, solutions of indi- vidual tree and population effects, respectively.However, for model-A, the effect of population was not included.
For estimating type-A genetic correlation between different traits, bivariate analysis was conducted featuring a similar model setup as in Eq. 1, except that the genetic variance was considered as homogenous.
The additive genetic correlations between traits a 1 and a 2 ( r (a) ), using variance and covariance components from the bivariate analysis, were calculated as: Type-B genetic correlations [51] of additive effects across sites, were calculated as follows: where σ a1,2 is the covariance between the additive effects of the same trait at different sites; σ 2 a1 and σ 2 a2 are estimated additive variances for the same traits at different sites.Standard errors for variance components and genetic parameters were estimated using the Taylor series expansion method.
Population differentiation of quantitative traits, represented by Q st index [52], was estimated following the mixed model (Eq.1), except that effect of genetic cluster was considered as random and Q st was calculated as: where σ 2 G is the variance of genetic cluster and other terms were defined above in Eq. 3.

Cross-validation for comparison of model performance
In this work we used a five-fold cross validation for all traits to compare predictive performance of the two model alternatives (model-A and model-B).The procedure consisted of dividing the dataset into five random folds of approximately equal size.Data in four folds were used for training the model (model development) and prediction of phenotypes in the one remaining fold (the testing fold) that has the phenotypes set to missing.The prediction process was repeated five times until each fold had been used for once as test set.The performance of the models was assessed by predictive ability and prediction accuracy.Predictive ability (PA) was evaluated as the mean Pearson correlation of (3) estimated breeding values (EBVs) of the individuals from the five replications with their observed phenotype ( y ).i.e., PA = cor EBVs, y .Standard error of the correlations was computed using the following equation: where σ is the standard deviation of the Pearson correla- tions and n is the number of replicates.Additionally, ACC was estimated as the PA scaled by average square root of heritabilities obtained from each replication.i.e., ACC = PA h 2 i .

Table 1
Estimated genetic parameters for growth and wood properties in trial Höreda based on two mixed-model approaches (model-A and model-B) excluding and including a fixed effect of population structure, respectively σ 2 A additive genetic variance, σ 2 e residual variance, h 2 i narrow-sense heritability estimates (standard error of estimates in parenthesis) obtained for RWT ( r b =0.77 and 0.60) based on model- A and model-B, respectively.However, genetic correlations between sites for chemical properties were much lower, ( r b ~ 0.45 for CELL and HEM, and 0.72 for LIG) (Table

Table 2
Estimated genetic parameters for growth and wood properties in trial Erikstorp based on two mixed-model approaches (model-A and model-B) excluding and including a fixed effect of population structure, respectively σ 2A additive genetic variance, σ 2 e residual variance, h 2 i narrow-sense heritability estimates (standard error of estimates in parentheses)

Table 3
Trait predictive ability (PA) and prediction accuracy (ACC) based on model-A and model-B in two trials Höreda and Erikstorp.(standard errors of the estimates in the parentheses)

Table 4
Additive genetic correlation estimates among growth and wood properties based on model-A and model-B.(Standard errors of the estimates in the parentheses)

Table 5
Additive genetic correlation estimates among growth and wood properties based on northern (CSE) and southern (ALP) populations.(Standard errors of the estimates in the parentheses)

Table 6
Summary of traits measured for this study